EXPLORER: Supporting Run-Time Parallelization of DO-ACROSS Loops on General Networks of Workstations
Authors
Abstract
Performing run-time parallelization on general networks of workstations (NOWs) without special hardware or system-software support is very difficult, especially for DOACROSS loops. With the high communication overhead on NOWs, run-time parallelization yields hardly any performance gain, because it requires a large number of messages for dependence detection, data accesses, and computation scheduling. In this paper, we introduce the EXPLORER system for run-time parallelization of DOACROSS and DOALL loops on general NOWs. EXPLORER hides the communication overhead on NOWs through multithreading, a facility supported on almost all workstations. A preliminary version of EXPLORER was implemented on a NOW consisting of eight DEC Alpha workstations connected through an Ethernet. The Pthread package was used to support multithreading. Experiments on synthetic loops showed speedups of up to 6.5 for DOACROSS loops and 7 for DOALL loops.
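The abstract gives no implementation details, so the following is only a minimal sketch of the kind of dependence-driven DOACROSS execution EXPLORER targets. It uses the Pthread primitives the paper mentions, but runs all "workers" as threads in a single process rather than on separate workstations; the loop body, the cyclic iteration assignment, and all identifiers (N, NTHREADS, ready[], worker()) are illustrative assumptions, not EXPLORER's actual interface.

/*
 * Toy DOACROSS loop  a[i] = a[i-1] + i  (cross-iteration dependence on a[i-1])
 * executed by several Pthreads with iterations assigned cyclically.
 * Each thread blocks until the producing iteration has published its result.
 * Compile with: cc -std=c99 -pthread doacross_sketch.c
 */
#include <pthread.h>
#include <stdio.h>

#define N        64
#define NTHREADS 4

static double          a[N];
static int             ready[N];   /* ready[i] = 1 once a[i] is valid */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;

    /* Cyclic assignment: thread id executes iterations id, id+NTHREADS, ... */
    for (long i = id; i < N; i += NTHREADS) {
        if (i > 0) {
            /* Wait for the iteration we depend on to finish. */
            pthread_mutex_lock(&lock);
            while (!ready[i - 1])
                pthread_cond_wait(&cond, &lock);
            pthread_mutex_unlock(&lock);
        }

        a[i] = (i > 0 ? a[i - 1] : 0.0) + (double)i;   /* loop body */

        /* Publish the result so the consumer of iteration i can proceed. */
        pthread_mutex_lock(&lock);
        ready[i] = 1;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];

    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, worker, (void *)id);
    for (long id = 0; id < NTHREADS; id++)
        pthread_join(t[id], NULL);

    printf("a[N-1] = %.1f\n", a[N - 1]);   /* expect 0+1+...+(N-1) = 2016 */
    return 0;
}

On a real NOW the wait on ready[i-1] would be a message receive from another workstation; the point made in the abstract is that another thread on the same workstation can keep computing while such a receive is pending, hiding the communication latency.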
Similar resources
Effects of Parallelism Degree on Run-Time Parallelization of Loops
Due to the overhead of exploiting and managing parallelism, run-time loop parallelization techniques that aim to maximize parallelism may not necessarily lead to the best performance. In this paper, we present two parallelization techniques that exploit different degrees of parallelism for loops with dynamic cross-iteration dependences. The DOALL approach exploits iteration-level paralleli...
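The distinction between the two loop classes discussed above can be made concrete with a toy example (illustrative only; the arrays, sizes, and loop bodies below are not taken from the cited paper):

#include <stdio.h>

#define N 8

int main(void)
{
    double a[N] = {0}, b[N], c[N];

    for (int i = 0; i < N; i++)
        b[i] = (double)i;

    /* DOALL: iterations are independent and could all run in parallel. */
    for (int i = 0; i < N; i++)
        c[i] = 2.0 * b[i];

    /* DOACROSS: iteration i reads a[i-1], written by iteration i-1, so
     * iterations can only be overlapped partially (pipelined execution). */
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];

    printf("c[N-1] = %.1f, a[N-1] = %.1f\n", c[N - 1], a[N - 1]);
    return 0;
}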
Local predecimation with range index communication parallelization strategy for fractal image compression on a cluster of workstations
In this paper, we have implemented and evaluated the performance of the local predecimation with range index (LPRI) communication parallelization strategy for fractal image compression on a Beowulf cluster of workstations. The strategy effectively balances the load among workstations. We have evaluated the execution time of LPRI, varying the number of workstations and the user-specified root mean square error...
Affine Transformations for Communication Minimized Parallelization and Locality Optimization of Arbitrarily Nested Loop Sequences
A long-running program often spends most of its time in nested loops. The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses for parallel execution. Affine transformations in this model capture a complex sequence of execution-reordering loop transformations that improve performance by parallelization as well as better locality. Although a significant am...
The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops because they have complex or statically insufficiently defined access patterns. As parallelizable loops arise frequently in practice, we advocate a novel framework for their identification: speculatively execute the loop as a doall, and apply a fully parallel data dependence test to determine if it ha...
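The speculative strategy sketched in this abstract can be illustrated with a deliberately simplified shadow-array check. This is an assumption-laden toy, not the actual LRPD test, which is finer grained (it allows same-iteration read-after-write and uses privatization and reduction analysis to accept more loops); all identifiers below are illustrative.

#include <stdio.h>

#define N 16

static double a[N];
static int shadow_read[N];    /* element was read during the loop        */
static int shadow_writes[N];  /* number of writes to the element         */

/* One speculative iteration: reads a[r], writes a[w], marking shadows.
 * (Shown sequentially for brevity; the speculative run would be a DOALL.) */
static void spec_iteration(int r, int w)
{
    double v = a[r];
    shadow_read[r] = 1;
    a[w] = v + 1.0;
    shadow_writes[w]++;
}

/* Conservative post-pass: reject the speculation if any element was both
 * read and written, or written more than once, across the whole loop. */
static int passes_test(void)
{
    for (int k = 0; k < N; k++)
        if ((shadow_read[k] && shadow_writes[k]) || shadow_writes[k] > 1)
            return 0;
    return 1;
}

int main(void)
{
    /* Access pattern known only at run time; here iteration i reads a[i]
     * and writes a[i + N/2], so no element is touched twice and the
     * speculative DOALL execution is accepted. */
    for (int i = 0; i < N / 2; i++)
        spec_iteration(i, i + N / 2);

    printf("speculation %s\n", passes_test() ? "succeeded" : "failed");
    return 0;
}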
Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources for concurrent threads. In this paper, to resolve this problem, a novel method is proposed that parallelizes the GA by designing three concurrent kernels, each of which runs some depe...